Strategy Selection
Monitor-Generate-Verify (MGV): Formalising Metacognitive Theory for Language Model Reasoning
Test-time reasoning architectures such as those following the Generate-Verify paradigm, where a model iteratively refines or verifies its own generated outputs, prioritise generation and verification but exclude the monitoring processes that determine when and how reasoning should begin. This omission may contribute to the prefix dominance trap, in which models commit early to suboptimal reasoning paths and seldom recover, yielding roughly 20% accuracy loss. We address this architectural gap by proposing the Monitor-Generate-Verify (MGV) framework, a computational translation of Flavell's and Nelson and Narens' metacognitive theories that preserves their psychological detail. MGV extends the Generate-Verify paradigm by adding explicit monitoring that captures metacognitive experiences (from difficulty assessments to confidence judgements) before generation begins and refines future monitoring through verification feedback. Though we present no empirical validation, MGV provides a vocabulary for diagnosing component-level failures in reasoning systems, suggests specific architectural interventions for future designs, and identifies connections to resource-rational analysis that may ground its mechanisms in normative principles.
- Research Report > Experimental Study (0.67)
- Research Report > New Finding (0.45)
- Education (0.67)
- Health & Medicine (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
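To make the Monitor-Generate-Verify loop from the abstract above concrete, here is a minimal runnable sketch. The toy task, the difficulty heuristic, and every function name below are illustrative assumptions rather than the paper's implementation; the point is only the control flow, in which monitoring precedes generation and verification feedback refines future monitoring.

```python
import random

def monitor(problem, history):
    # Metacognitive judgement before generation: estimate difficulty from
    # surface features plus past verification failures (all assumptions).
    difficulty = len(problem) / 20 + 0.2 * sum(1 for ok in history if not ok)
    return {"difficulty": difficulty, "samples": 1 if difficulty < 1 else 3}

def generate(problem, plan, rng):
    # Stand-in generator: usually proposes the right answer, sometimes not.
    target = sum(ord(c) for c in problem) % 10
    return rng.choice([target] * plan["samples"] + [target + 1])

def verify(problem, candidate):
    # Stand-in verifier with ground truth available for the toy task.
    return candidate == sum(ord(c) for c in problem) % 10

def mgv_solve(problem, budget=5, seed=0):
    rng, history = random.Random(seed), []
    for _ in range(budget):
        plan = monitor(problem, history)          # Monitor
        candidate = generate(problem, plan, rng)  # Generate
        ok = verify(problem, candidate)           # Verify
        history.append(ok)                        # feedback refines monitoring
        if ok:
            return candidate
    return None

print(mgv_solve("what is the checksum of this string?"))
```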
Robust Multi-Agent Decision-Making in Finite-Population Games
Park, Shinkyu, Bezerra, Lucas C. D.
Abstract: We study the robustness of an agent decision-making model in finite-population games, with a particular focus on the Kullback-Leibler Divergence Regularized Learning (KLD-RL) model. Specifically, we examine how the model's parameters influence the impact of various sources of noise and modeling inaccuracies (factors commonly encountered in engineering applications of population games) on agents' decision-making. Our analysis provides insights into how these parameters can be effectively tuned to mitigate such effects. Theoretical results are supported by numerical examples and simulation studies that validate the analysis and illustrate practical strategies for parameter selection.
The population game and evolutionary dynamics framework provides a powerful foundation for modeling and analyzing repeated strategic interactions among a population of decision-making agents [1].
- North America > United States > New Jersey > Hudson County > Secaucus (0.04)
- Asia > Middle East > Saudi Arabia (0.04)
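As one concrete reading of the KLD-RL model named in the abstract: the regulariser penalises the KL divergence between an agent's mixed strategy and a reference strategy mu, which yields a tilted-softmax choice rule. The sketch below is an assumption-level illustration (the closed form is standard for this regulariser; eta, mu, and the payoffs are made up), not the paper's full learning dynamics.

```python
import numpy as np

def kld_rl_choice(payoffs, mu, eta):
    # argmax_y  <y, payoffs> - eta * KL(y || mu)  has the closed form
    # y_i  proportional to  mu_i * exp(payoffs_i / eta).
    logits = np.log(mu) + np.asarray(payoffs) / eta
    logits -= logits.max()                     # numerical stability
    y = np.exp(logits)
    return y / y.sum()

payoffs = [1.0, 0.5, 0.2]
mu = np.array([1/3, 1/3, 1/3])
for eta in (0.1, 1.0, 10.0):   # larger eta keeps choices closer to mu
    print(eta, kld_rl_choice(payoffs, mu, eta).round(3))
```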
Export Reviews, Discussions, Author Feedback and Meta-Reviews
Summary: This very strong paper proposes a rational model for algorithm selection based on problem features and Bayesian regression. The model is shown to be effective computationally and to better predict human performance than comparable models. This paper is the epitome of a strong NIPS paper. The paper is clearly written and addresses an interesting problem. There is both a nice computational result about the algorithm and a cognitive model that is tested with a brief experiment.
Plan before Solving: Problem-Aware Strategy Routing for Mathematical Reasoning with LLMs
Qi, Shihao, Ma, Jie, Yin, Ziang, Zhang, Lingling, Zhang, Jian, Liu, Jun, Tian, Feng, Liu, Tongliang
Existing methods usually leverage a fixed strategy, such as natural language reasoning, code-augmented reasoning, tool-integrated reasoning, or ensemble-based reasoning, to guide Large Language Models (LLMs) in mathematical reasoning. Our analysis reveals that a single fixed strategy cannot adapt to problem-specific requirements and thus overlooks the trade-off between effectiveness and efficiency. To address these issues, we propose Planning and Routing through Instance-Specific Modeling (PRISM), a novel framework that decouples mathematical reasoning into two stages: strategy planning and targeted execution. Specifically, we first curate a multi-strategy preference dataset, which we call MathStrat, capturing correctness, process quality, and computational efficiency for each problem-strategy pair. Then, we train a lightweight Strategy Adapter on the dataset to obtain confidence distributions over the four reasoning strategies above. At inference time, an adaptive routing policy dynamically tailors the reasoning approach based on the adapter's confidence: single-strategy execution for high-confidence predictions, dual-strategy verification for competitive scenarios, or comprehensive multi-strategy exploration for uncertain cases. Extensive experiments across five mathematical reasoning benchmarks demonstrate that PRISM consistently outperforms individual strategies and ensemble baselines, achieving improvements ranging from 0.9% to 7.6% across different base models. The adaptive routing approach shows particularly strong benefits across diverse model architectures. Our code is released at https://github.com/reml-group/PRISM.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
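A minimal sketch of the adaptive routing policy the PRISM abstract describes: a single strategy when the adapter is confident, the top two for verification when they are competitive, and all four when uncertain. The strategy names come from the abstract; the thresholds (high, margin) are hypothetical values chosen for illustration.

```python
STRATEGIES = ["natural_language", "code_augmented", "tool_integrated", "ensemble"]

def route(confidences, high=0.7, margin=0.15):
    # confidences: the adapter's distribution over the four strategies.
    ranked = sorted(zip(STRATEGIES, confidences), key=lambda kv: -kv[1])
    (s1, c1), (s2, c2) = ranked[0], ranked[1]
    if c1 >= high:                 # confident: single-strategy execution
        return [s1]
    if c1 - c2 <= margin:          # competitive: dual-strategy verification
        return [s1, s2]
    return [s for s, _ in ranked]  # uncertain: full multi-strategy exploration

print(route([0.8, 0.1, 0.05, 0.05]))   # -> single strategy
print(route([0.4, 0.35, 0.15, 0.1]))   # -> dual verification
print(route([0.45, 0.2, 0.2, 0.15]))   # -> all four strategies
```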
Mitigating Strategy Preference Bias in Emotional Support Conversation via Uncertainty Estimations
Zhou, Yougen, Chen, Qin, Zhou, Ningning, Zhou, Jie, Wu, Xingjiao, He, Liang
Emotional support conversation (ESC) aims to alleviate distress through empathetic dialogue, yet large language models (LLMs) face persistent challenges in delivering effective ESC due to low accuracy in strategy planning, along with a considerable preference bias towards specific strategies. Prior methods using fine-tuned strategy planners have shown potential in reducing such bias, but the underlying causes of the preference bias in LLMs have not been well studied. To address these issues, we first reveal the fundamental causes of the bias by identifying the knowledge boundaries of LLMs in strategy planning. We then propose an approach to mitigate the bias via reinforcement learning with a dual reward function, which optimizes strategy planning through both accuracy and entropy-based confidence for each region defined by the knowledge boundaries. Experiments on the ESCov and ExTES datasets with multiple LLM backbones show that our approach outperforms the baselines, confirming its effectiveness.
- Europe > Austria > Vienna (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > Canada > Ontario > Toronto (0.04)
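One plausible instantiation of the dual reward sketched in the abstract, combining strategy accuracy with an entropy-based confidence term whose sign of influence flips across the knowledge boundary. The region split, the alpha weight, and this exact functional form are assumptions, not the paper's reward.

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def dual_reward(pred, gold, probs, in_known_region, alpha=0.5):
    # probs: the planner's distribution over support strategies (assumption).
    acc = 1.0 if pred == gold else 0.0
    conf = 1.0 - entropy(probs) / math.log(len(probs))  # peaked -> conf near 1
    # Inside the knowledge boundary, reward confident correctness; outside it,
    # reward calibrated uncertainty instead, discouraging overconfident bias.
    return acc + alpha * (conf if in_known_region else 1.0 - conf)

print(dual_reward("reflection", "reflection", [0.9, 0.05, 0.05], True))
```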
IntentionESC: An Intention-Centered Framework for Enhancing Emotional Support in Dialogue Systems
Zhang, Xinjie, Wang, Wenxuan, Jin, Qin
In emotional support conversations, unclear intentions can lead supporters to employ inappropriate strategies, inadvertently imposing their expectations or solutions on the seeker. Clearly defined intentions are essential for guiding both the supporter's motivations and the overall emotional support process. In this paper, we propose the Intention-centered Emotional Support Conversation (IntentionESC) framework, which defines the possible intentions of supporters in emotional support conversations, identifies key emotional state aspects for inferring these intentions, and maps them to appropriate support strategies. While Large Language Models (LLMs) excel at text generation, they fundamentally operate as probabilistic models trained on extensive datasets, lacking a true understanding of human thought processes and intentions. To address this limitation, we introduce the Intention Centric Chain-of-Thought (ICECoT) mechanism. ICECoT enables LLMs to mimic human reasoning by analyzing emotional states, inferring intentions, and selecting suitable support strategies, thereby generating more effective emotional support responses. To train the model with ICECoT and integrate expert knowledge, we design an automated annotation pipeline that produces high-quality training data. Furthermore, we develop a comprehensive evaluation scheme to assess emotional support efficacy and conduct extensive experiments to validate our framework. Our data and code are available at https://github.com/43zxj/IntentionESC_ICECoT.
- North America > United States > California > Los Angeles County > Beverly Hills (0.04)
- Europe > United Kingdom > England > South Yorkshire > Sheffield (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > China > Beijing > Beijing (0.04)
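The abstract describes ICECoT as a fixed reasoning chain: analyse the seeker's emotional state, infer the supporter intention, then map it to a strategy and a response. Here is a prompt-level sketch under that reading; the prompt wording and the call_llm helper are hypothetical, with a stub model so the example runs end to end.

```python
ICECOT_PROMPT = """You are an emotional supporter.
Dialogue so far:
{dialogue}

Step 1 - Emotional state: describe the seeker's emotions and their cause.
Step 2 - Intention: state what a supporter should intend here (e.g. comfort,
         explore the problem, suggest action).
Step 3 - Strategy: pick one support strategy consistent with that intention.
Step 4 - Response: write the supportive reply."""

def icecot_respond(dialogue, call_llm):
    # call_llm: any text-in/text-out completion function (assumption).
    return call_llm(ICECOT_PROMPT.format(dialogue=dialogue))

# Stub model in place of a real LLM call:
print(icecot_respond("Seeker: I failed my exam and feel hopeless.",
                     lambda prompt: "[model output would appear here]"))
```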
DeCoDe: Defer-and-Complement Decision-Making via Decoupled Concept Bottleneck Models
He, Chengbo, Zou, Bochao, Xing, Junliang, Chen, Jiansheng, Shi, Yuanchun, Ma, Huimin
In human-AI collaboration, a central challenge is deciding whether a task should be handled by the AI, deferred to a human expert, or addressed through collaborative effort. Existing Learning to Defer approaches typically make binary choices between AI and humans, neglecting their complementary strengths. They also lack interpretability, a critical property in high-stakes scenarios where users must understand and, if necessary, correct the model's reasoning. To overcome these limitations, we propose Defer-and-Complement Decision-Making via Decoupled Concept Bottleneck Models (DeCoDe), a concept-driven framework for human-AI collaboration. DeCoDe makes strategy decisions based on human-interpretable concept representations, enhancing transparency throughout the decision process. It supports three flexible modes: autonomous AI prediction, deferral to humans, and human-AI collaborative complementarity, selected via a gating network that takes concept-level inputs and is trained using a novel surrogate loss that balances accuracy and human effort. This approach enables instance-specific, interpretable, and adaptive human-AI collaboration. Experiments on real-world datasets demonstrate that DeCoDe significantly outperforms AI-only, human-only, and traditional deferral baselines, while maintaining strong robustness and interpretability even under noisy expert annotations.
- Asia > China > Beijing > Beijing (0.06)
- North America > United States > California (0.04)
- Health & Medicine > Diagnostic Medicine (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.90)
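A hedged PyTorch sketch of the gating idea in the DeCoDe abstract: concept activations feed both a classifier and a three-way gate (AI / defer / collaborate), trained with a surrogate loss that adds an effort penalty to the human-involving modes. Layer sizes, the effort constants, and this particular loss are assumptions, not the paper's exact surrogate formulation.

```python
import torch
import torch.nn as nn

class ConceptGate(nn.Module):
    def __init__(self, n_concepts, n_classes):
        super().__init__()
        self.concepts = nn.Linear(16, n_concepts)    # input -> interpretable concepts
        self.classifier = nn.Linear(n_concepts, n_classes)
        self.gate = nn.Linear(n_concepts, 3)         # AI / defer / collaborate

    def forward(self, x):
        c = torch.sigmoid(self.concepts(x))
        return self.classifier(c), torch.softmax(self.gate(c), dim=-1)

def surrogate_loss(logits, gate_probs, target, human_correct, effort=0.1):
    ai_loss = nn.functional.cross_entropy(logits, target, reduction="none")
    human_loss = 1.0 - human_correct                 # 0 when the expert is right
    collab_loss = torch.minimum(ai_loss, human_loss) # complementarity upside
    mode_losses = torch.stack([ai_loss,
                               human_loss + effort,          # deferring costs effort
                               collab_loss + 0.5 * effort], dim=-1)
    return (gate_probs * mode_losses).sum(-1).mean()

model = ConceptGate(n_concepts=8, n_classes=4)
x, y = torch.randn(5, 16), torch.randint(0, 4, (5,))
logits, gate = model(x)
print(surrogate_loss(logits, gate, y, human_correct=torch.ones(5)))
```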
Opinion-Driven Decision-Making for Multi-Robot Navigation through Narrow Corridors
Alghamdi, Norah K., Park, Shinkyu
We propose an opinion-driven navigation framework for multi-robot traversal through a narrow corridor. Our approach leverages a multi-agent decision-making model known as Nonlinear Opinion Dynamics (NOD) to address the narrow corridor passage problem, formulated as a multi-robot navigation game. By integrating the NOD model with a multi-robot path planning algorithm, we demonstrate that the framework effectively reduces the likelihood of deadlocks during corridor traversal. To ensure scalability with an increasing number of robots, we introduce a game reduction technique that enables efficient coordination in larger groups. Extensive simulation studies are conducted to validate the effectiveness of the proposed approach.
In recent years, robots have become increasingly integrated into human environments, including residential areas, healthcare facilities, and public spaces. As robots interact frequently with both humans and other robots, the importance of social navigation has grown significantly. Social navigation focuses on optimizing a robot's behavior to enhance human comfort and improve the acceptability of robots in shared spaces. For instance, when multiple robots navigate through a narrow corridor, as depicted in Figure 1, they must dynamically adapt their movements in response to the actions of others, ensuring smooth and cooperative interactions within such constrained environments.
- Asia > Middle East > Saudi Arabia (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
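For intuition on how nonlinear opinion dynamics can break deadlocks, below is a two-robot Euler-integration sketch of a standard NOD form (opinion decay, saturating self/neighbour coupling, small bias). Parameter values are illustrative, not the paper's; a negative coupling gamma drives the two opinions to opposite signs, meaning one robot commits to passing first.

```python
import math

def nod_step(z, d=1.0, u=2.0, alpha=1.0, gamma=-1.2, b=(0.01, 0.0), dt=0.05):
    # z_i' = -d*z_i + u*tanh(alpha*z_i + gamma*z_j) + b_i  (Euler step).
    # gamma < 0 (competitive coupling) pushes the two opinions apart.
    z1 = z[0] + dt * (-d * z[0] + u * math.tanh(alpha * z[0] + gamma * z[1]) + b[0])
    z2 = z[1] + dt * (-d * z[1] + u * math.tanh(alpha * z[1] + gamma * z[0]) + b[1])
    return (z1, z2)

z = (0.0, 0.0)                       # both robots initially undecided
for _ in range(200):
    z = nod_step(z)
print(f"final opinions: {z[0]:+.2f}, {z[1]:+.2f}")  # opposite signs: roles resolved
```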
Algorithm selection by rational metareasoning as a model of human strategy selection
Falk Lieder, Dillon Plunkett, Jessica B. Hamrick, Stuart J. Russell, Nicholas Hay, Tom Griffiths
Selecting the right algorithm is an important problem in computer science, because the algorithm often has to exploit the structure of the input to be efficient. The human mind faces the same challenge. Therefore, solutions to the algorithm selection problem can inspire models of human strategy selection and vice versa. Here, we view the algorithm selection problem as a special case of metareasoning and derive a solution that outperforms existing methods in sorting algorithm selection. We apply our theory to model how people choose between cognitive strategies and test its prediction in a behavioral experiment. We find that people quickly learn to adaptively choose between cognitive strategies. People's choices in our experiment are consistent with our model but inconsistent with previous theories of human strategy selection. Rational metareasoning appears to be a promising framework for reverse-engineering how people choose among cognitive strategies and translating the results into better solutions to the algorithm selection problem.
- North America > United States > Oregon > Benton County > Corvallis (0.04)
- North America > United States > Kansas (0.04)
- Asia > Middle East > Jordan (0.04)
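In the spirit of the abstract's rational-metareasoning account, a minimal sketch: one Bayesian linear regression per algorithm predicts runtime from problem features, and selection Thompson-samples the posteriors so the agent explores early and converges on the adaptive choice. The features, priors, and runtime numbers are assumptions for illustration, not the paper's model.

```python
import numpy as np

class BayesLinReg:
    def __init__(self, dim, prior_var=10.0, noise_var=1.0):
        self.P = np.eye(dim) / prior_var   # posterior precision
        self.b = np.zeros(dim)             # precision-weighted mean
        self.noise_var = noise_var

    def update(self, x, y):
        self.P += np.outer(x, x) / self.noise_var
        self.b += x * y / self.noise_var

    def sample_prediction(self, x, rng):
        cov = np.linalg.inv(self.P)        # Thompson sample of the weights
        w = rng.multivariate_normal(cov @ self.b, cov)
        return x @ w

def select_algorithm(models, features, rng):
    # Pick the algorithm whose sampled runtime prediction is lowest.
    return min(models, key=lambda k: models[k].sample_prediction(features, rng))

rng = np.random.default_rng(0)
models = {"merge_sort": BayesLinReg(2), "insertion_sort": BayesLinReg(2)}
features = np.array([1.0, 50.0])                  # [bias, list length]
models["insertion_sort"].update(features, 4.0)    # observed: slow on large input
models["merge_sort"].update(features, 2.0)
print(select_algorithm(models, features, rng))
```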
On the Reasoning Capacity of AI Models and How to Quantify It
Radha, Santosh Kumar, Goktas, Oktay
Recent advances in Large Language Models (LLMs) have intensified the debate surrounding the fundamental nature of their reasoning capabilities. While achieving high performance on benchmarks such as GPQA and MMLU, these models exhibit limitations in more complex reasoning tasks, highlighting the need for more rigorous evaluation methodologies. We propose a novel phenomenological approach that goes beyond traditional accuracy metrics to probe the underlying mechanisms of model behavior, establishing a framework that could broadly impact how we analyze and understand AI systems. Using positional bias in multiple-choice reasoning tasks as a case study, we demonstrate how systematic perturbations can reveal fundamental aspects of model decision-making. To analyze these behaviors, we develop two complementary phenomenological models: a Probabilistic Mixture Model (PMM) that decomposes model responses into reasoning, memorization, and guessing components, and an Information-Theoretic Consistency (ITC) analysis that quantifies the relationship between model confidence and strategy selection. Through controlled experiments on reasoning benchmarks, we show that true reasoning remains challenging for current models, with apparent success often relying on sophisticated combinations of memorization and pattern matching rather than genuine logical deduction. More fundamentally, we demonstrate that accuracy alone often overstates a model's reasoning abilities, as model behavior can be characterized through underlying mechanisms in the phase space of cognitive strategies, revealing how models dynamically balance different approaches when responding to queries. This framework enables quantitative criteria for real-world deployments, allowing applications to specify reliability thresholds based on strategy distributions rather than aggregate performance metrics.
- North America > United States (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > France (0.04)
- Research Report > Strength High (0.54)
- Research Report > Experimental Study (0.54)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
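To illustrate what a mixture decomposition of the PMM kind can look like, here is a toy maximum-likelihood fit on synthetic data: each answer is attributed to reasoning (always correct), memorisation (correct only on previously seen items), or uniform guessing. The component definitions, the seen-item proxy, and the data are assumptions for a four-choice task, not the paper's model.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(w, correct, seen_before, n_choices=4):
    w_r, w_m = w
    w_g = 1.0 - w_r - w_m
    # P(correct): reasoning solves the item, memorisation only helps on
    # previously seen items, guessing is uniform over the choices.
    p = w_r * 1.0 + w_m * seen_before + w_g / n_choices
    p = np.clip(np.where(correct, p, 1.0 - p), 1e-9, 1.0)
    return -np.log(p).sum()

rng = np.random.default_rng(1)
seen = rng.random(500) < 0.3                      # 30% of items seen in training
correct = rng.random(500) < (0.2 + 0.5 * seen)    # synthetic model behaviour
res = minimize(neg_log_lik, x0=[0.3, 0.3], args=(correct, seen),
               bounds=[(0, 1), (0, 1)], method="L-BFGS-B")
print("fitted (reason, memorise) weights:", res.x.round(2))
```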